Freelance Senior Site Reliability Engineer - Long Term - Hybrid


We are seeking a Senior Site Reliability Engineer to lead the design and operation of reliability, observability, and incident response capabilities across critical platforms and services.

This role combines deep hands-on Kubernetes expertise with advanced monitoring and incident management practices using Datadog and Jira. You will work closely with Product Engineering and Platform teams to ensure services are highly available, observable, and operationally mature.


Responsibilities


  • Design, build, and operate highly reliable, scalable distributed systems.
  • Define and manage SLIs, SLOs, and error budgets.
  • Conduct reliability reviews and resilience assessments.
  • Implement proactive reliability improvements and automation.
  • Design and implement end-to-end observability strategies.
  • Build and manage dashboards, monitors, and alerts in Datadog.
  • Instrument services using metrics, logs, traces, and APM.
  • Manage the incident lifecycle in Jira (tracking, escalation, resolution).
  • Develop and maintain incident runbooks and response playbooks.
  • Deploy, operate, and scale production Kubernetes clusters.
  • Troubleshoot cluster, node, pod, and networking issues.
  • Manage ingress controllers, service discovery, and traffic routing.
  • Implement autoscaling, resource optimization, and workload resilience.


Additional Details


  • START DATE: ASAP
  • DURATION: Long term
  • CONTRACT: Freelance
  • ONSITE POLICY: Liege (flexible remote working)
  • HOURS PER WEEK: Full time
  • LANGUAGES: English only
  • INTERVIEW PROCESS: One stage


If you are interested, or know anyone who could be a good fit, please send your CV to matthew@Kodastaff.com.
